Texture analysis of magnetic resonance image to differentiate benign from malignant myxoid soft tissue tumors: A retrospective comparative study

It is important to differentiate between benign and malignant myxoid tumors to establish the treatment plan, determine the optimal surgical extent, and plan postoperative surveillance, but differentiation may be complicated by imaging-feature overlap. Texture analysis is used for quantitative assessment of imaging characteristics based on mathematically calculated pixel heterogeneity and has been applied to the discrimination of benign from malignant soft tissue tumors (STTs). In this study, we aimed to assess the diagnostic value of the texture features of conventional magnetic resonance images for the differentiation of benign from malignant myxoid STTs. Magnetic resonance images of 39 patients with histologically confirmed myxoid STTs of the extremities were analyzed. Qualitative features were assessed and compared between the benign and malignant groups. Texture analysis was performed, and texture features were selected based on univariate analysis and Fisher’s coefficient. The diagnostic value of the texture features was assessed using receiver operating characteristic curve analysis. T1 heterogeneity showed a statistically significant difference between benign and malignant myxoid STTs, with substantial inter-reader reliability. The sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of T1 heterogeneity were 55.6%, 83.3%, 88.2%, 45.5%, and 64.1%, respectively. Among the texture features, T2w-WavEnLL_s-3 showed good diagnostic performance, and T2w-WavEnLL_s-4 and GeoW4 showed fair diagnostic performance. The logistic regression model including T1 heterogeneity and T2_WavEnLL_s-4 showed good diagnostic performance. However, there was no statistically significant difference between the overall qualitative assessment by a radiologist and the predictor model. Geometry-based and wavelet-derived texture features from T2-weighted images were significantly different between benign and malignant myxoid STTs. However, the texture features had a limited additive value in differentiating benign from malignant myxoid STTs.


Introduction
Myxoid soft tissue tumors (STTs) are a heterogeneous group of mesenchymal tumors characterized by an abundant production of extracellular myxoid stroma [1][2][3][4]. The myxoid stroma tends to trap water molecules, resulting in the high water content of myxoid STTs. Myxoid tumors share a common magnetic resonance imaging (MRI) feature of low signal intensity on T1-weighted images (T1WIs) and high signal intensity on T2-weighted images (T2WIs) [5,6]. Despite the common histologic and imaging features, the biological behavior of these tumors varies widely, from benign to malignant.
The differentiation between benign and malignant myxoid tumors is crucial for establishing a proper treatment plan, determining the optimal surgical extent, and planning postoperative surveillance. Several imaging features, including tumor size, shape or margin, presence of necrosis, edema, and homogeneity of T1 signal intensity, are known to aid in the differentiation between benign and malignant STTs [7][8][9][10]. However, these imaging features overlap between benign and malignant lesions, making the differentiation difficult. This is particularly relevant for myxoid STTs because of the common imaging features resulting from their high water content.
Several studies have attempted to identify the imaging features differentiating benign from malignant myxoid tumors. Harish et al. [11] suggested that the diagnosis of malignancy was favored when the lesion exhibited a larger average dimension and heterogeneity on T1WIs. Crombe et al. [12] reported that the differentiation between benign and malignant lesions could be reproducibly achieved using conventional MR features, including ill-defined tumor margins, intratumoral hemorrhage or fat, fibrosis, and presence of the tail sign. Other studies have emphasized the importance of certain imaging signs in the diagnosis of myxoid STTs: the "shiny cap" sign for intramuscular myxomas [13], the "target sign" in peripheral nerve sheath tumors [14], and the "tail sign" in myxofibrosarcomas and undifferentiated sarcomas [15,16]. However, most of the suggested findings are qualitative imaging features, which are limited by their subjectivity.
Texture analysis is a novel imaging tool for quantitative assessment of imaging characteristics based on mathematically calculated pixel heterogeneity. In recent decades, many studies have reported promising results of texture analysis in the differentiation and prognosis of tumors of the brain, lung, liver, and soft tissue. Several studies have evaluated the use of texture analysis for discriminating benign from malignant STTs [8,17,18] and cartilaginous bone tumors [19]. We hypothesized that texture analysis of conventional MR images of myxoid STTs may aid in the differentiation of benign from malignant myxoid STTs.
Thus, the purpose of this study was to assess the diagnostic value of texture features of conventional MR images for differentiating between benign and malignant myxoid STTs of the musculoskeletal system and to evaluate their additive value to the qualitative assessment of MR images.

Materials and methods
The Institutional Review Board of Seoul National University Bundang Hospital (IRB No. B-1910-568-102) approved this study and waived the requirement for informed consent. Data for the study were retrospectively obtained and anonymized before analysis, posing no more than minimal risk to the study subjects. All methods were performed in accordance with the relevant guidelines and regulations.

Study population
We searched our electronic medical records to identify patients with histologically confirmed myxoid STTs of the extremities who had undergone preoperative MRI between January 2010 and December 2018. The following myxoid STTs were included in our search: myxoma, myxolipoma, acral fibromyxoma, fibromyxoid tumor, myxoid liposarcoma, fibromyxoid sarcoma, myxoinflammatory fibroblastic sarcoma, myxofibrosarcoma, myxosarcoma, and extraskeletal myxoid chondrosarcoma. This search revealed 97 patients.
Among the 97 patients, we excluded those who met any of the following exclusion criteria: 1) history of previous surgery (n = 6); 2) inadequate MR image quality due to low resolution or artifacts (n = 18); 3) MR images acquired with MR systems with a magnetic field strength of 1.5 T or lower (n = 15); and 4) insufficient pulse sequences for evaluation (n = 19).

Qualitative assessment of MRI features
Two readers (a radiologist with 8 years of experience in musculoskeletal radiology and a radiologist in musculoskeletal radiology fellowship training) independently evaluated the MR images using a picture archiving and communication system (INFINITT; Infinitt Healthcare, Seoul, Korea). The readers were aware of the patients' clinical information, including age, sex, and clinical history before surgery, but were blinded to the histological diagnosis.
In addition, each reader was asked to determine whether the STTs were benign or malignant, considering the above-mentioned MRI characteristics (overall assessment).

Texture analysis
For each lesion, the radiologist in fellowship training selected a single axial image section that best represented the characteristics of the tumor on both T1WIs and T2WIs (Figs 1 and 2), and a single region of interest (ROI) was drawn manually along the tumor border (Figs 1C and 2C). When the axial images were of suboptimal quality because of motion artifacts from pulsating vessels, coronal or sagittal images were used for ROI selection. The second, experienced radiologist reviewed and confirmed the selected free-hand ROIs.
For texture analysis of myxoid STTs, we used the software package MaZda 4.6 (Institute of Electronics, Technical University of Lodz, Poland; available at http://www.eletel.p.lodz.pl/mazda/), which can calculate more than 300 texture features [20][21][22]. The signal intensity of each ROI was normalized prior to the computation of textural features, using the limitation of dynamics to μ ± 3σ (μ, gray-level mean; σ, gray-level standard deviation) to reduce the dependency of higher-order features on the first-order gray-level distribution.
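The μ ± 3σ limitation of dynamics described above can be illustrated with a short sketch. This is not MaZda's code: the function name `normalize_roi`, the use of the population standard deviation, and the rescaling to `n_bits` gray levels are assumptions made only for this example.

```python
from statistics import mean, pstdev


def normalize_roi(pixels, n_bits=8):
    """Clip ROI gray levels to [mu - 3*sigma, mu + 3*sigma] and rescale.

    `pixels` is a flat list of the gray levels inside the ROI.
    `n_bits` (assumed here, not taken from the paper) sets the output range.
    """
    mu = mean(pixels)
    sigma = pstdev(pixels)  # population SD; an assumption of this sketch
    lo, hi = mu - 3 * sigma, mu + 3 * sigma
    top_level = 2 ** n_bits - 1
    if hi == lo:  # constant ROI: nothing to stretch
        return [0 for _ in pixels]
    normalized = []
    for p in pixels:
        clipped = min(max(p, lo), hi)  # limit the dynamic range
        normalized.append(round((clipped - lo) / (hi - lo) * top_level))
    return normalized
```

Because outliers beyond three standard deviations are clipped before rescaling, a few very bright or very dark pixels no longer compress the remaining gray levels into a narrow band.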
The following geometry and texture features were computed for each ROI: geometry features (horizontal and vertical coordinates of the gravity center, maximal diameter, perimeter, etc.); gray-level histogram features (mean, variance, skewness, kurtosis, and the 1%, 10%, 50%, 90%, and 99% percentiles); absolute gradient features (mean, variance, skewness, kurtosis, and percentage of pixels with nonzero gradient); co-occurrence matrix features (angular second moment, contrast, correlation, sum of squares, inverse difference moment, sum average, sum variance, sum entropy, entropy, difference variance, and difference entropy), which were computed for five between-pixel distances (1-5) and for four directions (horizontal, vertical, 45 degrees, and 135 degrees); run-length matrix features (run-length nonuniformity, gray-level nonuniformity, long-run emphasis, short-run emphasis, and fraction of image in runs), which were computed for four directions (horizontal, vertical, 45 degrees, and 135 degrees); autoregressive model features (Theta: model parameter vector, four features; Sigma: standard deviation of the driving noise); and wavelet-derived features (wavelet energy), which were computed at five scales within four frequency bands: low-low (LL), low-high (LH), high-low (HL), and high-high (HH).
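To make the wavelet energy features more concrete, the following sketch computes sub-band energies with a Haar wavelet. Several points are assumptions of this illustration rather than statements about MaZda: the choice of the Haar wavelet with orthonormal scaling, the definition of energy as the mean squared coefficient, the LH/HL naming convention, and the use of a rectangular patch whose sides are divisible by 2 at every scale (the study used free-hand ROIs). The function names are hypothetical.

```python
def haar_level(img):
    """One level of an orthonormal 2-D Haar transform.

    `img` is a list of rows with even height and width (assumed here).
    Returns a dict holding the four sub-bands as lists of rows.
    """
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, len(img), 2):
        rows = {name: [] for name in bands}
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows["LL"].append((a + b + d + e) / 2)  # local average
            rows["LH"].append((a - b + d - e) / 2)  # left-right difference
            rows["HL"].append((a + b - d - e) / 2)  # top-bottom difference
            rows["HH"].append((a - b - d + e) / 2)  # diagonal difference
        for name in bands:
            bands[name].append(rows[name])
    return bands


def band_energy(band):
    """Mean of the squared coefficients in one sub-band."""
    values = [v for row in band for v in row]
    return sum(v * v for v in values) / len(values)


def wavelet_energies(img, scales):
    """Energy of each sub-band at scales 1..`scales`, recursing on LL."""
    energies = {}
    current = img
    for s in range(1, scales + 1):
        bands = haar_level(current)
        energies[s] = {name: band_energy(b) for name, b in bands.items()}
        current = bands["LL"]  # the next scale decomposes the smoothed image
    return energies
```

Under these assumptions, a feature named like WavEnLL_s-3 would correspond to `wavelet_energies(img, 3)[3]["LL"]`, that is, the energy of the low-frequency band after three successive decompositions.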
The geometry and texture features that were among the top 10 features selected based on Fisher's coefficient and showed a significant between-group difference in the univariate analysis were selected for further analysis.
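The ranking step can be sketched as follows. The formula used here is one common two-class form of Fisher's coefficient (squared difference of class means divided by the prior-weighted within-class variance); it is assumed, not confirmed by the text, that this matches the software's definition. The function names and the layout of the `features` dictionary are hypothetical.

```python
from statistics import mean, pvariance


def fisher_coefficient(benign, malignant):
    """Between-class variance divided by within-class variance (two classes)."""
    n_b, n_m = len(benign), len(malignant)
    p_b = n_b / (n_b + n_m)  # class prior estimated from group sizes
    p_m = n_m / (n_b + n_m)
    between = (mean(benign) - mean(malignant)) ** 2
    within = p_b * pvariance(benign) + p_m * pvariance(malignant)
    return between / within if within > 0 else float("inf")


def rank_features(features, labels, k=10):
    """Return the names of the k features with the highest Fisher coefficient.

    `features` maps a feature name to one value per lesion;
    `labels` holds 0 (benign) or 1 (malignant) in the same order.
    """
    scores = {}
    for name, values in features.items():
        benign = [v for v, y in zip(values, labels) if y == 0]
        malignant = [v for v, y in zip(values, labels) if y == 1]
        scores[name] = fisher_coefficient(benign, malignant)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A feature would then be carried forward only if it appears in `rank_features(..., k=10)` and also shows a significant between-group difference in the univariate test.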

Statistical analysis
Categorical variables were compared using either the chi-square test or Fisher's exact test. Continuous variables were tested for normality using the Shapiro-Wilk test, and the groups were compared using either the independent Student's t-test or the rank-sum test, as appropriate. For texture features showing a significant difference between the benign and malignant groups, receiver operating characteristic (ROC) curve analysis was performed to determine their diagnostic value.
To create a model for prediction of malignancy, logistic regression analysis was performed including MRI findings and texture features. The diagnostic performance of the logistic regression model was compared with that of the overall qualitative assessment by radiologists by ROC comparison using the method suggested by DeLong et al. Statistical analyses were performed using IBM SPSS Statistics v.21.0 (IBM Corp., Armonk, NY, USA) and Stata (v.14.0; StataCorp, College Station, TX, USA). A p-value of <0.05 was considered statistically significant.
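As an illustration of how a single texture feature can be scored in ROC analysis, the sketch below computes the area under the ROC curve (AUC) as the probability that a randomly chosen malignant lesion has a higher feature value than a randomly chosen benign one, with ties counted as one half. This rank-based formulation is equivalent to the trapezoidal AUC; the function name `roc_auc` is hypothetical, and the sketch is not a description of the statistical software used in the study.

```python
def roc_auc(scores, labels):
    """AUC from pairwise comparisons of positive and negative cases.

    `scores` are feature values (higher is assumed to suggest malignancy);
    `labels` are 1 for malignant and 0 for benign.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # correctly ordered pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))
```

If a feature tends to be lower in malignant lesions, its values can be negated before calling the function so that the AUC is reported on the usual 0.5 to 1.0 scale.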

Results
A total of 39 patients (21 men and 18 women) with a mean age of 54.8±13.8 years (range, 19-87 years) were included in this study. The demographic features, histologic diagnoses, and lesion locations are listed in Table 1.

Qualitative assessment of MRI features
The qualitative MRI features for discriminating benign from malignant myxoid STTs and their inter-reader reliability are presented in Table 2. Among them, only T1 heterogeneity showed a statistically significant difference between benign and malignant myxoid STTs (p = 0.024) in the univariate analysis, and the inter-reader reliability was substantial (κ = 0.679). The sensitivity, specificity, positive predictive value, negative predictive value, and accuracy for discriminating benign from malignant myxoid STTs were 55.6%, 83.3%, 88.2%, 45.5%, and 64.1%, respectively, for T1 heterogeneity and 88.9%, 50.0%, 80.0%, 66.6%, and 76.9%, respectively, for the overall assessment.
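The reported measures all follow from a 2 × 2 table of imaging result against histology. The sketch below shows the arithmetic. The cell counts in the usage note are not stated in the text; they are an assumption, chosen only because they sum to 39 and reproduce the percentages reported for T1 heterogeneity.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard diagnostic measures from a 2 x 2 table.

    tp: malignant lesions called malignant; fn: malignant called benign;
    fp: benign lesions called malignant;   tn: benign called benign.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical counts (assumed, not taken from the paper's tables).
example = diagnostic_metrics(tp=15, fn=12, fp=2, tn=10)
percentages = {name: round(value * 100, 1) for name, value in example.items()}
```

With these assumed counts, `percentages` holds 55.6, 83.3, 88.2, 45.5, and 64.1 for sensitivity, specificity, positive predictive value, negative predictive value, and accuracy, which shows how a high specificity can coexist with a low negative predictive value when malignant lesions predominate.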

Texture analysis
The selected geometry and texture features of T1- and T2WIs are listed in Table 3, along with their diagnostic performance. Texture features were selected according to the results of the univariate analysis and their ranking based on Fisher's coefficient. The selected features

Combined conventional and logistic regression model
The logistic regression model was constructed by including the selected morphologic and texture analysis features with the best diagnostic performance for differentiating benign from malignant myxoid STTs. Among all candidate models, the model including T1 heterogeneity and T2_WavEnLL_s-4 showed the highest AUC value and good diagnostic performance (AUC, 0.833; 95% confidence interval, 0.688-0.978; Table 4). However, there was no statistically significant difference in the AUC values between the overall assessment by a radiologist based on qualitative MR features and the combined model including T1 heterogeneity and T2_WavEnLL_s-4 (p = 0.085).

Discussion
Differentiation between benign and malignant myxoid STTs is often challenging because of their common imaging features resulting from the myxoid stroma. In this study, we focused on myxoid STTs and found that among various texture features derived from conventional T1- and T2WIs, geometry-based and wavelet-derived features showed a significant difference between benign and malignant myxoid STTs. However, they had no significant additive value in differentiating benign from malignant myxoid STTs relative to the overall qualitative assessment by a radiologist. Numerous studies have addressed the differentiation between benign and malignant STTs [7,9,17,25,26,27]. Most studies have emphasized the importance of lesion size, margin irregularity, and heterogeneity of lesion signal intensity on either T1- or T2WIs, and others have attempted a more objective approach with texture analysis [17,26]. However, few studies have focused on the differentiation of myxoid tumors [11,12,18,28]. A recent study by Martin-Carreras et al. [29] showed that radiomic features from MRI are helpful in differentiating myxomas from myxofibrosarcomas, along with the T1-weighted signal intensity and volume of the lesion.
In our study, heterogeneity of T1 signal intensity was the sole qualitative imaging feature showing a significant difference between benign and malignant myxoid STTs. This finding coincides with that of Harish et al. [11], who found that heterogeneity of the lesion on T1WIs was an imaging feature favoring the diagnosis of malignancy in STTs with a "cyst-like" appearance. However, in that study, a larger average dimension of the mass was statistically the most significant predictor of malignancy, which was not the case in our study. This discrepancy may have been caused by the difference in the inclusion criteria. Specifically, Harish et al. included lesions based on their signal intensity, and the benign group included ganglion and bursa lesions, as well as myxomas and schwannomas. In our study, we did not include overtly benign lesions with typical imaging features, such as ganglion and bursa lesions or schwannomas; we only included those benign lesions that may pose a diagnostic challenge. As a result, there was no statistically significant difference in the lesion size between the benign and malignant groups.
In the present study, imaging features such as lesion margin, intratumoral hemorrhage, necrosis, or peritumoral features, including peritumoral edema and fascial tail sign, did not show a significant difference between benign and malignant lesions. This conflicts with the findings of Crombe et al. [12], who showed that ill-defined margins, intratumoral fat, a hemorrhagic component, fibrosis, and the "tail sign" were associated with malignancy. The majority of both benign and malignant lesions in our study showed a well-defined border, which was also found in the study by Harish et al. [11]. This shows that the assessment of qualitative imaging features may be subjective and limited in terms of reproducibility, emphasizing the need for an objective, quantitative assessment of imaging findings. Hence, we applied texture analysis for a more objective assessment of the intratumoral signal intensities on T1- and T2WIs.
Several texture features derived from geometry and wavelet energy of T1WIs and T2WIs showed a significant difference between benign and malignant myxoid tumors. However, these differences were small and showed low Fisher's coefficients, indicating a low discriminating value. These results are similar to the findings reported in previous studies. Mayerhoefer et al. [30] investigated the differentiation of benign from malignant STTs on MRI by means of texture analysis. They included a heterogeneous group of benign non-neoplastic, benign neoplastic, and malignant lesions, and found that only two texture features derived from the gray-level histogram of short-tau inversion recovery sequences were able to discriminate between benign and malignant lesions. In contrast, Juntu et al. [17] applied machine learning to the texture analysis features of T1WIs and concluded that it is a potentially valuable tool for the differentiation between malignant and benign STTs. In their study, machine learning classifiers based on texture features showed an accuracy of up to 93%, which was higher than the radiologist classification accuracy of 90%. However, most of the benign tumors included in their study seldom pose a diagnostic problem: lipomas, schwannomas, and cavernous hemangiomas. In our study, the overall assessment by a radiologist based on qualitative imaging features showed a diagnostic accuracy of 76.9%. The relatively low diagnostic accuracy probably resulted from the inclusion of only diagnostically challenging cases in the benign lesion group. Furthermore, both the individual texture features and the combined logistic model failed to show a significant increase in the diagnostic performance compared with the overall assessment by the radiologist. Texture features derived from conventional MR images seem to have a limited additive value in the differentiation of benign from malignant myxoid STTs in cases that are diagnostically challenging to radiologists.
Our study has some limitations. First, it was performed retrospectively and only included a small number of histologically confirmed cases. Thus, future prospective studies with a larger sample size are warranted. Second, because of the retrospective design, the MRI protocol varied. The non-standardized imaging parameters (TR, TE, and pixel size) may have affected the texture features. Further studies with a more standardized imaging protocol may be needed. Third, contrast-enhanced images were not included. We excluded contrast-enhanced images from our analysis because the amount of administered contrast medium and the time interval between contrast medium injection and image acquisition varied from case to case, which would have affected the texture features. Finally, we used a two-dimensional approach for texture analysis. As complete three-dimensional volume data were not included, the results could vary depending on the image slice selected for analysis.

Conclusions
In conclusion, geometry-based and wavelet-derived texture features from T2WIs showed a significant difference between benign and malignant myxoid STTs. However, the texture features of T1- and T2WIs had a limited additive value in differentiating benign from malignant myxoid STTs relative to the overall qualitative image assessment by radiologists.